AITopics | state-action space

For continuous state-action space scenarios, classical reinforcement learning (RL) theory predominantly focuses on low-rank Markov decision processes (MDPs), which provide sample-efficient guarantees at the expense of restrictive structural assumptions. Kernel smoothing model-based approaches offer a promising alternative paradigm that instead leverages the smoothness of the MDP and employs non-parametric kernel smoothing estimates of transition dynamics. This paper proposes a new kernel-smoothing model-based approach for online reinforcement learning in finite-horizon settings under Lipschitz continuity assumptions on the MDP. By incorporating a Bernstein-style exploration bonus into the kernel smoothing framework, our method achieves a regret bound which improves upon the state-of-the-art regret bound in its dependence on the horizon. The theoretical advancement relies on a delicate analysis of the synergy between Bernstein-style bonuses and kernel smoothing, where a new tight Bernstein-type concentration inequality for martingales may be of independent interest.
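As a rough illustration of the idea in the abstract above, the following sketch computes a kernel-smoothed estimate of the expected next-state value at a query state-action point and adds a Bernstein-style bonus (a variance term plus a horizon-scaled lower-order term). It is a minimal sketch under stated assumptions, not the paper's algorithm: the Gaussian kernel, the regulariser `reg`, the constant `log_term`, and all function names are hypothetical choices made here for illustration.

```python
import numpy as np


def gaussian_kernel(dist, bandwidth):
    """Gaussian smoothing kernel evaluated on distances (assumed kernel choice)."""
    return np.exp(-0.5 * (dist / bandwidth) ** 2)


def kernel_smoothed_value(query, visited, next_values, bandwidth=0.1, reg=1.0):
    """Kernel-weighted mean and variance of observed next-state values.

    query:       (d,) state-action point at which to estimate
    visited:     (n, d) previously visited state-action points
    next_values: (n,) value estimates at the observed next states
    reg:         hypothetical regulariser added to the effective sample count
    """
    dists = np.linalg.norm(visited - query, axis=1)
    weights = gaussian_kernel(dists, bandwidth)
    total = weights.sum() + reg  # effective number of nearby samples
    mean = weights @ next_values / total
    var = weights @ (next_values - mean) ** 2 / total
    return mean, var, total


def bernstein_style_bonus(var, total, horizon, log_term=1.0):
    """Variance-aware bonus: sqrt(var / n) term plus an H / n correction."""
    return np.sqrt(2.0 * var * log_term / total) + horizon * log_term / total
```

The bonus shrinks as the effective sample count near the query grows, so rarely visited regions of the state-action space receive more optimism than well-explored ones.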

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

2605.07218

Country: Asia (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

437d46a857214c997956eaf0e3b21a55-Supplemental.pdf

Neural Information Processing Systems, Apr-25-2026, 15:32:06 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Robots (0.93)

Local Linearity: the Key for No-regret Reinforcement Learning in Continuous MDPs

Neural Information Processing Systems, Feb-16-2026, 11:46:14 GMT

Existing solutions either work under very specific assumptions or achieve bounds that are vacuous in some regimes.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country:

Europe > Italy > Lombardy > Milan (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

We introduce a novel framework for analyzing reinforcement learning (RL) in continuous state-action spaces, and use it to prove fast rates of convergence in both

Neural Information Processing Systems, Feb-16-2026, 06:18:22 GMT

We argue that these properties are satisfied in many continuous state-action Markov decision processes.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Information Technology (0.67)
Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

437d46a857214c997956eaf0e3b21a55-Supplemental.pdf

Neural Information Processing Systems, Feb-8-2026, 09:47:15 GMT

algorithm, mdp, proximity, (14 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.93)

Appendix: On the Expressivity of Markov Reward

Neural Information Processing Systems, Feb-8-2026, 08:57:26 GMT

Instead, we suggest that for a given CMP, it is natural to be interested in Markov rewards, but acknowledge the importance of going beyond such functions.

artificial intelligence, constraint, reward function, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.57)

POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis

Neural Information Processing Systems, Feb-8-2026, 00:03:03 GMT

In this paper, we consider Monte-Carlo planning in an environment with continuous state-action spaces, a much less understood problem with important applications in control and robotics.

algorithm, artificial intelligence, planning & scheduling, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.89)

We thank the reviewers for the comments and constructive feedback and we are delighted that they appreciated the

Neural Information Processing Systems, Feb-7-2026, 19:34:23 GMT

RL-as-inference (see discussion in Section 4.3), they differ crucially in how the objective is interpreted.

artificial intelligence, comment and constructive feedback, practical algorithm, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.73)

Kernelized Reinforcement Learning with Order Optimal Regret Bounds

Neural Information Processing Systems, Dec-23-2025, 21:33:04 GMT

Modern reinforcement learning (RL) has shown empirical success in various real-world settings with complex models and large state-action spaces. The existing analytical results, however, typically focus on settings with a small number of state-action pairs or simple models such as linearly modeled state-action value functions. To derive RL policies that efficiently handle large state-action spaces with more general value functions, some recent works have considered nonlinear function approximation using kernel ridge regression. We propose $\pi$-KRVI, an optimistic modification of least-squares value iteration for the case where the action-value function is represented by an RKHS. We prove the first order-optimal regret guarantees under a general setting. Our results show an improvement over the state of the art that is polynomial in the number of episodes. In particular, with highly non-smooth kernels (such as the Neural Tangent kernel or some Matérn kernels), the existing results lead to trivial (superlinear in the number of episodes) regret bounds. We show a sublinear regret bound that is order optimal in the cases where a lower bound on regret is known (which includes the kernels mentioned above).
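The general pattern the abstract refers to, kernel ridge regression with an optimism bonus, can be sketched as follows. This is a generic upper-confidence estimate and does not reproduce the specifics of $\pi$-KRVI; the RBF kernel, the `ridge` and `beta` parameters, and the function names are assumptions introduced here for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.5):
    """RBF kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq / lengthscale ** 2)


def optimistic_q(z_query, z_train, targets, ridge=1.0, beta=1.0):
    """Kernel ridge prediction plus a width-based optimism bonus.

    z_query: (m, d) state-action points to evaluate
    z_train: (n, d) observed state-action points
    targets: (n,) regression targets (e.g. reward plus next-step value)
    beta:    hypothetical confidence multiplier
    """
    gram = rbf_kernel(z_train, z_train) + ridge * np.eye(len(z_train))
    k_q = rbf_kernel(z_train, z_query)  # shape (n, m)
    mean = k_q.T @ np.linalg.solve(gram, targets)
    # Predictive variance; k(z, z) = 1 for the RBF kernel used here.
    var = 1.0 - np.sum(k_q * np.linalg.solve(gram, k_q), axis=0)
    width = np.sqrt(np.clip(var, 0.0, None))
    return mean + beta * width, mean, width
```

Points far from the training data get a width close to 1, so the optimistic estimate favours unexplored state-action pairs, while the width contracts near observed data.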

kernelized reinforcement learning, name change, order optimal regret bound, (4 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)
